Manifold-based multi-objective policy search with sample reuse

Authors

  • Simone Parisi
  • Matteo Pirotta
  • Jan Peters
Abstract

Many real-world applications are characterized by multiple conflicting objectives. In such problems optimality is replaced by Pareto optimality and the goal is to find the Pareto frontier, a set of solutions representing different compromises among the objectives. Despite recent advances in multi-objective optimization, achieving an accurate representation of the Pareto frontier is still an important challenge. Building on recent advances in reinforcement learning and multi-objective policy search, we present two novel manifold-based algorithms to solve multi-objective Markov decision processes. These algorithms combine episodic exploration strategies and importance sampling to efficiently learn a manifold in the policy parameter space such that its image in the objective space accurately approximates the Pareto frontier. We show that episode-based approaches and importance sampling can lead to significantly better results in the context of multi-objective reinforcement learning. Evaluated on three multi-objective problems, our algorithms outperform state-of-the-art methods both in terms of quality of the learned Pareto frontier and sample efficiency.
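The two ideas the abstract relies on, Pareto optimality and sample reuse through importance sampling, can be illustrated with a minimal sketch. This is not the authors' manifold-based algorithm; the function names `dominates`, `pareto_front`, and `reweighted_return` are hypothetical, all objectives are assumed to be maximized, and the log-probabilities are assumed to be supplied by whatever sampling distributions generated and now evaluate the episodes.

```python
import math


def dominates(a, b):
    """True if `a` is at least as good as `b` in every objective and
    strictly better in at least one (assumption: maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the non-dominated return vectors, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def reweighted_return(returns, logp_new, logp_old):
    """Self-normalized importance sampling estimate of the expected return
    vector under a new sampling distribution, reusing episodes drawn from
    an old one. `logp_new[i]` and `logp_old[i]` are the log-probabilities
    of sample i under the new and old distributions (assumed given)."""
    log_w = [n - o for n, o in zip(logp_new, logp_old)]
    shift = max(log_w)  # subtract the max before exponentiating for stability
    weights = [math.exp(lw - shift) for lw in log_w]
    total = sum(weights)
    n_objectives = len(returns[0])
    return [
        sum(w * r[j] for w, r in zip(weights, returns)) / total
        for j in range(n_objectives)
    ]


# Five episodes, each with a two-objective return.
episode_returns = [(1, 5), (2, 4), (1, 4), (3, 1), (2, 2)]
front = pareto_front(episode_returns)
```

Here `front` contains the three return vectors that no other episode dominates. When the old and new distributions coincide, all importance weights are equal and `reweighted_return` reduces to the plain sample mean.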

Similar resources

The Integrated Supply Chain of After-sales Services Model: A Multi-objective Scatter Search Optimization Approach

Abstract: In recent decades, the high profits of extended warranties have led third-party firms to regard them as a lucrative after-sales service. However, the segmentation of customers by risk aversion and the effect of offering an extended warranty on manufacturers’ basic warranty should be investigated when adjusting such services. Since risk-averse customers welcome extended warranty, while the cu...


Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning

Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. Policy search often requires a large number of samples for obtaining a stable policy update estimator, and this is prohibitive when the sampling cost is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previo...


Efficient Sample Reuse in EM-Based Policy Search

Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems such as anthropomorphic robots. Because of its high flexibility, policy search often requires a large number of samples to obtain a stable policy update estimator. However, this is prohibitive when the sampling cost is expensive. In this paper, we extend an E...


Multi-Stage Fuzzy Load Frequency Control Based on Multi-objective Harmony Search Algorithm in Deregulated Environment

A new Multi-Stage Fuzzy (MSF) controller based on the Multi-objective Harmony Search Algorithm (MOHSA) is proposed in this paper to solve the Load Frequency Control (LFC) problem of power systems in a deregulated environment. LFC problems are caused by load perturbations, which continuously disturb the normal operation of the power system. The objectives of LFC are to minimize the transient deviati...


A stochastic network design of bulky waste recycling – a hybrid harmony search approach based on sample approximation

Facing supply uncertainty of bulky wastes, the capacitated multi-product stochastic network design model for bulky waste recycling is proposed in this paper. The objective of this model is to minimize the first-stage total fixed costs and the expected value of the second-stage variable costs. The possibility of operation costs and transportation costs for bulky waste recycling is considered ...


Journal:
  • Neurocomputing

Volume 263

Publication date: 2017